This post introduces eBPF, its early development tool BCC and BCC's drawbacks, the key technologies behind BPF CO-RE, and the compilation process, structure, and life cycle of a BPF application.
An eBPF program is a piece of user-provided code that is injected directly into the kernel. Once loaded and verified, BPF programs execute in kernel context. They run inside kernel memory space, with access to the internal kernel state made available to them.
Today, eBPF is used extensively to drive a wide variety of use cases: Providing high-performance networking and load-balancing in modern data centers and cloud native environments, extracting fine-grained security observability data at low overhead, helping application developers trace applications, providing insights for performance troubleshooting, preventive application and container runtime security enforcement, and much more.
eBPF structure
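If we want a BPF program to be portable (able to run on other environments and kernel versions), the traditional approach is to compile it "on the fly" on the target host. Kernel data structures change between kernel versions and build configurations: fields get added, moved, renamed, or compiled out. A BPF program that reads kernel memory therefore has to be built against the exact memory layout of the kernel it will run on.
That is why early BPF developers used the BCC toolkit.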
With BCC, you embed the C source code of your BPF program into your user-space program (the control application) as a plain string. When the control application is eventually deployed and executed on the target host, BCC invokes its embedded Clang/LLVM and pulls in the local kernel headers. You have to make sure those headers are installed on the system from the correct kernel-devel package. BCC then compiles the program on the fly. This ensures that the memory layout the BPF program expects is exactly the same as in the target host's running kernel.
If you have to deal with optional and potentially compiled-out parts of the kernel, you add #ifdef/#else guards to your source code. These guards accommodate hazards such as renamed fields, values with different semantics, or anything not available in the current configuration. The embedded Clang removes the irrelevant parts of your code and tailors the BPF program to the specific kernel.
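As a rough illustration of that #ifdef style, here is a plain-C sketch that compiles without kernel headers. It makes three assumptions:

- `DEMO_KERNEL_VERSION` copies the basic encoding of the kernel's `KERNEL_VERSION()` macro.
- `DEMO_LINUX_VERSION_CODE` is a hypothetical stand-in for the `LINUX_VERSION_CODE` that BCC would pick up from the target host's headers.
- `struct demo_task` is a toy stand-in for `task_struct`, whose `state` field was renamed to `__state` in Linux 5.14.

```c
#include <assert.h>

/* Same basic encoding as KERNEL_VERSION() in <linux/version.h>. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical stand-in for LINUX_VERSION_CODE of the target kernel;
 * with BCC this value would come from the locally installed headers. */
#ifndef DEMO_LINUX_VERSION_CODE
#define DEMO_LINUX_VERSION_CODE DEMO_KERNEL_VERSION(5, 10, 0)
#endif

/* Toy stand-in for struct task_struct: Linux 5.14 renamed the
 * "state" field to "__state" (and changed its type). */
struct demo_task {
    int pid;
#if DEMO_LINUX_VERSION_CODE >= DEMO_KERNEL_VERSION(5, 14, 0)
    unsigned int __state;
#else
    long state;
#endif
};

/* Read the task state whichever field name this kernel build uses;
 * only the branch matching the target kernel survives preprocessing. */
long demo_task_state(const struct demo_task *t)
{
#if DEMO_LINUX_VERSION_CODE >= DEMO_KERNEL_VERSION(5, 14, 0)
    return (long)t->__state;
#else
    return t->state;
#endif
}
```

Every such guard has to be written and maintained by hand. The program is only correct because it is recompiled against each target kernel's headers.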
BCC tools architecture
This sounds great, doesn’t it? Not quite so, unfortunately. While this workflow works, it’s not without major drawbacks.
- The kernel-devel package is required:
  - kernel-devel is missing internal headers. If you need something from the kernel that is not exposed through public headers, you have to copy/paste the type definitions into your BPF code by hand to get your work done;
  - kernel-devel can get out of sync with the running kernel.
Can we compile once, then run the same binary everywhere?
Libbpf + BPF CO-RE chose a different way. Their philosophy is that BPF programs are not much different from any "normal" user-space program: they ought to be compiled once into small binaries and then deployed unmodified, in a compact form, to target hosts. The goal is to Compile Once and Run Everywhere (CO-RE), so the same BPF binary works across many different kernels.
BPF CO-RE key ingredient relations
If your BPF program accesses the task_struct->pid field, Clang records that it is exactly a field named "pid" of type "pid_t" residing within a struct task_struct. The target kernel's task_struct layout may have moved "pid" to a different offset, e.g., because an extra field was added before it. It may even have moved "pid" into some nested anonymous struct or union, which is completely transparent in C code, so no one ever pays attention to details like that. In either case, the field can still be found by its name and type information. This is called a field offset relocation. Besides a field's offset, other aspects of a field, such as its existence or size, can also be captured (and subsequently relocated).
Without this, accommodating such kernel differences would require #ifdef'ed source code compiled into two separate BPF program variants, with the appropriate variant picked manually by the control application at runtime. All of this would be unnecessary added complexity and pain.
The vmlinux.h header is generated from the running kernel's BTF:
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
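The field offset relocation idea can be modeled in plain user-space C. This is a toy sketch, not libbpf's actual implementation. It rests on these assumptions:

- `task_v1` and `task_v2` are hypothetical layouts of the same structure in two kernel builds.
- The `field_info` tables play the role of each kernel's BTF.
- The "program" asks for a field by name and size instead of hard-coding an offset, so the same logic reads `pid` correctly from both layouts.

Real CO-RE matches on the name and full BTF type information, and libbpf patches the resolved offsets into the BPF instructions at load time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Two hypothetical kernel builds: v2 added fields before "pid". */
struct task_v1 { pid_t pid; long state; };
struct task_v2 { long state; void *extra; pid_t pid; };

/* Toy "BTF": where each field lives in a given kernel build. */
struct field_info { const char *name; size_t offset; size_t size; };

static const struct field_info btf_v1[] = {
    { "pid",   offsetof(struct task_v1, pid),   sizeof(pid_t) },
    { "state", offsetof(struct task_v1, state), sizeof(long) },
    { NULL, 0, 0 },
};

static const struct field_info btf_v2[] = {
    { "state", offsetof(struct task_v2, state), sizeof(long) },
    { "extra", offsetof(struct task_v2, extra), sizeof(void *) },
    { "pid",   offsetof(struct task_v2, pid),   sizeof(pid_t) },
    { NULL, 0, 0 },
};

/* "Relocate" a field: look it up by name and expected size in the target
 * kernel's description. Returns 0 and stores the offset in *off, or -1 if
 * the field does not exist (or has a different size) in this kernel. */
int relocate_field(const struct field_info *btf, const char *name,
                   size_t size, size_t *off)
{
    for (; btf->name != NULL; btf++) {
        if (strcmp(btf->name, name) == 0 && btf->size == size) {
            *off = btf->offset;
            return 0;
        }
    }
    return -1;
}

/* Read pid from an opaque task pointer using the relocated offset. */
pid_t read_pid(const void *task, size_t pid_off)
{
    pid_t pid;
    memcpy(&pid, (const char *)task + pid_off, sizeof(pid));
    return pid;
}
```

The lookup failing with -1 corresponds to the "field existence" check mentioned above: the program can take a fallback path instead of reading garbage.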
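Since vmlinux.h already contains all kernel types, the kernel-devel package is no longer needed.
The BPF skeleton header is generated from the compiled BPF object file:
bpftool gen skeleton <app>.bpf.o > <app>.skel.h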
You end up with a small user-space binary that embeds the compiled BPF code through the BPF skeleton and statically links libbpf, so it doesn't depend on a system-wide libbpf. The result is a compact (around 200KB), fast, stand-alone binary that can run everywhere. The BCC project has a collection of such tools, called libbpf-tools.
A BPF map is BPF's abstraction of a data container. Many different things are modeled as BPF maps: simple arrays and hash maps, per-socket and per-task local storage, BPF perf and ring buffers, and even some more exotic uses. The important thing is that most BPF maps allow looking up, updating, and deleting their elements by key. BPF maps are the means of sharing state between (potentially many) BPF programs and user space. In a BPF program, a map definition usually looks like the template below. There are a few exceptions, such as PERF_EVENT_ARRAY, STACK_TRACE, DEVMAP, CPUMAP, etc.:
struct {
    __uint(type, BPF_MAP_TYPE_<type>); /* ARRAY, HASH, PERCPU_ARRAY, ... */
    __uint(max_entries, <max entry number>);
    __type(key, <key variable type>);
    __type(value, <value variable type>); /* if it's a struct, you can define it manually in <app>.h or choose one from vmlinux.h */
} <map name> SEC(".maps");
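To see why the template uses these odd-looking macros, here is a host-compilable sketch of a concrete BTF-defined map. The `__uint` and `__type` macros below mirror the definitions in libbpf's `bpf_helpers.h`, except that libbpf spells `__typeof__` as `typeof`:

- `max_entries = 10240` becomes the dimension of a pointer-to-array type.
- `key` and `value` become pointer types.

All map attributes therefore live in the type information that clang emits as BTF and libbpf later reads. `DEMO_MAP_TYPE_HASH`, `struct event_stats`, and `stats_map` are hypothetical names. In a real `.bpf.c` you would include `vmlinux.h` and `bpf/bpf_helpers.h` and use `BPF_MAP_TYPE_HASH` instead of defining anything yourself.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of libbpf's bpf_helpers.h macros. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) __typeof__(val) *name

/* Place the map in the ".maps" ELF section; skipped on non-ELF hosts,
 * where section names have a different syntax. */
#if defined(__ELF__)
#define SEC(name) __attribute__((section(name), used))
#else
#define SEC(name)
#endif

/* Stand-in for BPF_MAP_TYPE_HASH (value 1 in linux/bpf.h). */
enum { DEMO_MAP_TYPE_HASH = 1 };

/* Hypothetical value type, e.g. declared in <app>.h. */
struct event_stats {
    uint64_t count;
    uint64_t bytes;
};

/* Hash map from a 32-bit key (e.g. a PID) to struct event_stats. */
struct {
    __uint(type, DEMO_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, uint32_t);
    __type(value, struct event_stats);
} stats_map SEC(".maps");
```

The struct members are never dereferenced. Only their types carry information, which is why `sizeof(*stats_map.max_entries) / sizeof(int)` recovers 10240.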
BPF CO-RE compilation process
vmlinux.h header file with all kernel types
BPF CO-RE deployment process
<app>.bpf.c: BPF C code containing the logic to be executed in kernel context. Many BPF programs can be defined within the same BPF C file, and they can have different types (i.e., SEC() annotations). You can also define multiple BPF programs with the same SEC() attribute; libbpf handles that just fine. All BPF programs defined within the same BPF C file share all global state (global variables, BPF maps). This is frequently used to coordinate a few collaborating BPF programs.
const volatile marks a global variable as read-only for both BPF code and user-space code. User space can set or modify it only before the BPF skeleton is loaded.
vmlinux.h: this header contains all kernel types. It includes those exposed as part of UAPI, internal types available through kernel-devel, and some more internal kernel types not available anywhere else. Unfortunately, BTF (like DWARF) doesn't record #define macros, so some common macros might be missing from vmlinux.h. The most commonly needed ones are provided by libbpf's bpf_helpers.h (the kernel-side "library" provided by libbpf).
bpf_helpers.h: provided by libbpf. It contains the most often used macros, constants, and BPF helper definitions, which virtually every existing BPF application uses.
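A minimal sketch of the const volatile pattern, written as plain C so it compiles on any host. `filter_pid` and `should_report` are hypothetical names. In a real `<app>.bpf.c` the check would live inside a `SEC()` program, and user space would set `skel->rodata->filter_pid` after `<name>__open()` and before `<name>__load()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Read-only knob; in a BPF object this global lands in .rodata.
 * "volatile" stops the compiler from folding the initial value 0 into
 * the code, so a value written by user space before load is what the
 * loaded program actually reads. */
const volatile int filter_pid = 0;

/* 0 means "report every process"; any other value selects one PID. */
bool should_report(int pid)
{
    return filter_pid == 0 || pid == filter_pid;
}
```

With plain `const`, the compiler would be free to treat `filter_pid` as the constant 0 and optimize the comparison away, making the user-space setting ineffective.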
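bpf_map_<operation>_elem(&some_map, &keyvar[, &valuevar][, args]): manipulates maps from the kernel side. Common operations include lookup, update, delete, push, pop, peek, etc. Push, pop, and peek operate on queue/stack maps and take a value pointer instead of a key.
SEC() macro: places the annotated symbol into a specially named ELF section. A BPF program that will be loaded into the kernel is written as a normal C function placed in such a section. The section name tells libbpf the program type and where to attach it.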
SEC("tp/syscalls/sys_enter_write") int handle_tp(void *ctx) { ... } defines a tracepoint BPF program, which will be called each time any user-space application invokes the write() syscall.
char LICENSE[] SEC("license") defines the license of your BPF code. Specifying the license is mandatory and is enforced by the kernel. Common values include "GPL", "GPL v2", "GPL and additional rights", "Dual BSD/GPL", "Dual MIT/GPL", and "Dual MPL/GPL". Some BPF functionality is unavailable to non-GPL-compatible code.
<app>.c: user-space C code, which loads the BPF code and interacts with it throughout the lifetime of the application.
bpf.h (bpf/bpf.h): defines low-level user-space helpers, thin wrappers around the bpf() syscall, for working with BPF programs and maps.
libbpf.h (bpf/libbpf.h): includes libbpf types and functions.
<app>.skel.h: reflects the high-level structure of <app>.bpf.c. It also simplifies BPF code deployment logistics by embedding the contents of the compiled BPF object file inside the header file.
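On the older kernels that need it, a typical user-space `<app>.c` raises `RLIMIT_MEMLOCK` at the very start of `main()`, before touching the skeleton. Here is a minimal sketch using only `setrlimit(2)` and `getrlimit(2)`; `bump_memlock_rlimit` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Raise RLIMIT_MEMLOCK so the kernel lets this process lock the memory
 * needed for BPF maps and programs. Tries RLIM_INFINITY first (this needs
 * privileges unless the hard limit is already unlimited), then falls back
 * to raising the soft limit up to the current hard limit.
 * Returns 0 on success, -errno on failure. */
int bump_memlock_rlimit(void)
{
    struct rlimit rl = { .rlim_cur = RLIM_INFINITY, .rlim_max = RLIM_INFINITY };

    if (setrlimit(RLIMIT_MEMLOCK, &rl) == 0)
        return 0;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -errno;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -errno;
    return 0;
}
```

Falling back to the hard limit is a design choice so the function also succeeds for unprivileged test runs. A loader running as root simply gets `RLIM_INFINITY`.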
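The skeleton exposes global variables through:
- skel->rodata for read-only variables;
- skel->bss for mutable zero-initialized variables;
- skel->data for non-zero-initialized mutable variables.
libbpf_set_print() installs a custom callback for all libbpf logs. This is extremely useful, especially during active development, because it lets you capture helpful libbpf debug logs.
Bumping the RLIMIT_MEMLOCK limit raises the kernel's internal per-user memory limit so the BPF subsystem can allocate the resources needed for your BPF programs, maps, etc. On kernels older than 5.11 you have to bump RLIMIT_MEMLOCK one way or another. Calling setrlimit(RLIMIT_MEMLOCK, ...) at the very beginning of your program is the simplest and most convenient way. Starting with 5.11, BPF memory is accounted through memory cgroups instead.
<app>.h (optional): a header file with common type definitions, shared by both the BPF and user-space code of the application.
A BPF application typically goes through the following phases, and the generated BPF skeleton has a corresponding function to trigger each phase:
- Open: <name>__open() parses the BPF object file and discovers its maps, programs, and global variables. User space can still adjust them at this point, e.g. set initial values of read-only variables.
- Load: <name>__load() creates the maps, resolves relocations, and loads and verifies the BPF programs in the kernel.
- Attach: <name>__attach() attaches the programs to their hooks (tracepoints, kprobes, etc.), after which they start running.
- Tear down: <name>__destroy() detaches the programs and frees all kernel and user-space resources.
Use <name>__open_and_load() if you don't need to adjust your BPF program before the load phase.
If you can install Debian 11 directly, do so and skip this subsection. This subsection explains how to upgrade Debian 10 to Debian 11 when Debian 11 cannot be installed directly.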
The Debian 11 kernel is built with CONFIG_DEBUG_INFO=y and CONFIG_DEBUG_INFO_BTF=y by default. This provides /sys/kernel/btf/vmlinux, the file used to generate vmlinux.h.
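Before generating vmlinux.h, you can check whether the running kernel exposes BTF at `/sys/kernel/btf/vmlinux`. Here is a minimal C sketch; `file_readable` and `kernel_has_btf` are hypothetical helper names. On older kernels such as Debian 10's 4.19, it is expected to report that BTF is missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Where kernels built with CONFIG_DEBUG_INFO_BTF=y expose their BTF. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* True if "path" exists and is readable by this process. */
bool file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* True if the running kernel exposes BTF, i.e. bpftool can dump
 * vmlinux.h from it and libbpf can use it for CO-RE relocations. */
bool kernel_has_btf(void)
{
    return file_readable(KERNEL_BTF_PATH);
}
```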
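If your Debian 10 system doesn't have a mirror configured yet, first edit /etc/apt/sources.list:
- comment out the cdrom ... line;
- add the mirror entries (for example the Taiwan mirror http://ftp.tw.debian.org/debian), four lines in total;
- add non-free.
When you are done, the file should look like this: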
deb http://ftp.tw.debian.org/debian/ buster main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ buster main contrib non-free
deb http://security.debian.org/debian-security buster/updates main
deb-src http://security.debian.org/debian-security buster/updates main
# buster-updates, previously known as 'volatile'
deb http://ftp.tw.debian.org/debian/ buster-updates main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ buster-updates main contrib non-free
Next, run the following commands to switch the package lists from Debian 10 (buster) to Debian 11 (bullseye):
sudo sed -i 's/buster/bullseye/g' /etc/apt/sources.list
sudo sed -i 's/buster/bullseye/g' /etc/apt/sources.list.d/*
Check that buster has been replaced by bullseye in the lists:
deb http://ftp.tw.debian.org/debian/ bullseye main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ bullseye main contrib non-free
deb http://security.debian.org/debian-security bullseye/updates main
deb-src http://security.debian.org/debian-security bullseye/updates main
# bullseye-updates, previously known as 'volatile'
deb http://ftp.tw.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ bullseye-updates main contrib non-free
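Finally, fix the Debian security entries, then save the file. In Debian 11 the security suite is named bullseye-security instead of bullseye/updates: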
deb http://ftp.tw.debian.org/debian/ bullseye main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ bullseye main contrib non-free
deb https://deb.debian.org/debian-security bullseye-security main contrib
deb-src https://deb.debian.org/debian-security bullseye-security main contrib
# bullseye-updates, previously known as 'volatile'
deb http://ftp.tw.debian.org/debian/ bullseye-updates main contrib non-free
deb-src http://ftp.tw.debian.org/debian/ bullseye-updates main contrib non-free
Update the repository lists:
sudo apt update
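First run a minimal upgrade. If the screen shows information about package restarts, press q to skip it. If you are asked whether services may be restarted during package upgrades without asking, you can choose "Yes" on the left to simplify the installation. This step takes about half an hour.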
sudo apt upgrade --without-new-pkgs
Then run the full upgrade. It goes like the previous step, and this time it takes about an hour and a half.
sudo apt full-upgrade
When it finishes, reboot. You can then remove obsolete packages and cached files to free up a lot of space:
sudo apt --purge autoremove
sudo apt autoclean
Run the following commands to install the required packages and tools:
sudo apt-get update
sudo apt-get install clang build-essential bpftool git libbpf-dev libelf-dev zlib1g-dev
First, get the sample code:
git clone https://github.com/sartura/ebpf-core-sample
cd ebpf-core-sample
Modify the beginning of hello.bpf.c as follows. BPF_NO_GLOBAL_DATA must be defined before bpf_helpers.h is included:
#include "vmlinux.h"
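/* Ask libbpf's bpf_printk() to keep its format string on the stack instead of in global .rodata,
   so the program can also load on kernels without BPF global data support (added in Linux 5.2),
   such as Debian 10's 4.19 kernel */
#define BPF_NO_GLOBAL_DATA 1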
#include <bpf/bpf_helpers.h>
After the modification, run the following commands in order and ignore any warnings. You will get two executables, hello and maps:
# Generate the vmlinux.h header file
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
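# Compile hello.bpf.c into the hello.bpf.o object file
# The gcc in this setup cannot compile .bpf.c files, so clang is used (this may change in the future)
# -g makes clang emit the BTF information that BPF CO-RE and bpftool rely on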
# __TARGET_ARCH_x86 is the architecture name libbpf's bpf_tracing.h expects on x86-64
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 -I . -c hello.bpf.c -o hello.bpf.o
# Generate the hello.skel.h header from hello.bpf.o
bpftool gen skeleton hello.bpf.o > hello.skel.h
# Compile hello.c into the hello.o object file
clang -g -O2 -Wall -I . -c hello.c -o hello.o
# Get the libbpf source code
git clone https://github.com/libbpf/libbpf && cd libbpf/src/
# Build libbpf as a static library
make BUILD_STATIC_ONLY=1 OBJDIR=../build/libbpf DESTDIR=../build INCLUDEDIR= LIBDIR= UAPIDIR= install
# Return to the original directory
cd ../../
# Link hello.o with the static libbpf library to produce the hello executable
# -lelf and -lz link libbpf's dependencies (libelf and zlib) and must be passed to the compiler
clang -Wall -O2 -g hello.o libbpf/build/libbpf.a -lelf -lz -o hello
# Build the maps executable the same way
clang -g -O2 -target bpf -D__TARGET_ARCH_x86 -I . -c maps.bpf.c -o maps.bpf.o
bpftool gen skeleton maps.bpf.o > maps.skel.h
clang -g -O2 -Wall -I . -c maps.c -o maps.o
clang -Wall -O2 -g maps.o libbpf/build/libbpf.a -lelf -lz -o maps
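Try running sudo ./hello and sudo ./maps on Debian 11; they print the next commands or actions to take. Once you have confirmed that they work, copy the whole directory to a Debian 10 machine and run the executables directly. You get the same results, which is exactly the spirit of CO-RE.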